Conditionally linear Gaussian models for estimating vocal tract resonances
Abstract
Vocal tract resonances play a central role in the perception and analysis of speech. Here we consider the canonical task of estimating such resonances from an observed acoustic waveform, and formulate it as a statistical model-based tracking problem. In this vein, Deng and colleagues recently showed that a robust linearization of the formant-to-cepstrum map enables the effective use of a Kalman filtering framework. We extend this model both to account for the uncertainty of speech presence by way of a censored likelihood formulation and to explicitly model formant cross-correlation via a vector autoregression, and in doing so retain a conditionally linear and Gaussian framework amenable to efficient estimation schemes. We provide evaluations using a recently introduced public database of formant trajectories, for which results indicate improvements of 20% to over 30% per formant in terms of root mean square error, relative to a contemporary benchmark formant analysis tool.
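As a rough illustration of the conditionally linear Gaussian structure the abstract describes (not the authors' implementation), the sketch below runs a Kalman filter over a first-order vector autoregression. The matrix `C` is a hypothetical stand-in for the linearized formant-to-cepstrum map, the off-diagonal entries of `A` stand in for formant cross-correlation, and skipping the measurement update on frames flagged as non-speech is a simplified substitute for the censored likelihood. All names, dimensions, and numerical values are assumptions chosen for the example.

```python
import numpy as np

def kalman_filter(y, speech_present, A, Q, C, R, x0, P0):
    """Filter x_t = A x_{t-1} + w_t, y_t = C x_t + v_t (w ~ N(0,Q), v ~ N(0,R)).

    Frames where speech_present[t] is False skip the measurement update,
    so the estimate for those frames is the model prediction alone.
    """
    n = x0.shape[0]
    x, P = x0.copy(), P0.copy()
    means = np.zeros((len(y), n))
    for t in range(len(y)):
        # Prediction step from the vector autoregression.
        x = A @ x
        P = A @ P @ A.T + Q
        if speech_present[t]:
            # Measurement update with gain K = P C^T S^{-1}.
            S = C @ P @ C.T + R
            K = np.linalg.solve(S, C @ P).T
            x = x + K @ (y[t] - C @ x)
            P = (np.eye(n) - K @ C) @ P
        means[t] = x
    return means

# Synthetic demonstration; every value here is hypothetical.
rng = np.random.default_rng(0)
n_formants, n_cepstra, n_frames = 3, 15, 200
A = np.array([[0.95, 0.02, 0.00],
              [0.02, 0.95, 0.02],
              [0.00, 0.02, 0.95]])   # off-diagonals couple neighboring formants
Q = 0.01 * np.eye(n_formants)
C = rng.standard_normal((n_cepstra, n_formants))  # stand-in linearized map
R = 1e-4 * np.eye(n_cepstra)

# Simulate a state trajectory and noisy observations of it.
truth = np.zeros((n_frames, n_formants))
x = np.zeros(n_formants)
for t in range(n_frames):
    x = A @ x + rng.multivariate_normal(np.zeros(n_formants), Q)
    truth[t] = x
y = truth @ C.T + 0.01 * rng.standard_normal((n_frames, n_cepstra))

speech_present = np.ones(n_frames, dtype=bool)
speech_present[50:60] = False        # pretend these frames contain no speech

means = kalman_filter(y, speech_present, A, Q, C, R,
                      np.zeros(n_formants), np.eye(n_formants))
```

In this sketch the state is a zero-mean deviation rather than a formant frequency in hertz; a practical tracker would presumably re-linearize the observation map around the current estimate and work in physical units.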
Similar resources
On instantaneous vocal tract length estimation from formant frequencies
The length of the vocal tract and its relationship with formant frequencies is examined at fine temporal scales with the goal of providing accurate estimates of vocal tract length from acoustics on a spectrum-by-spectrum basis despite unknown articulatory information. Accurate vocal tract length estimation is motivated by applications to speaker normalization and biometrics. Analyses presented ...
Conditional Dependence in Longitudinal Data Analysis
Mixed models are widely used to analyze longitudinal data. In their conventional formulation as linear mixed models (LMMs) and generalized LMMs (GLMMs), a commonly indispensable assumption in settings involving longitudinal non-Gaussian data is that the longitudinal observations from subjects are conditionally independent, given subject-specific random effects. Although conventional Gaussian...
Continuous Voice Morphing Using Separated Vocal Tract Area Functions and Glottal Source Waves
This paper presents a flexible voice morphing method, which is based on a conversion using a linear combination of the vocal tract area functions estimated from speech signals. The method focuses on the continuity of the phonological identity of the overall interpolated area. The main features of the method are 1) to separate characteristics of the vocal tract resonances from those of glottal s...
Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features
This chapter discusses the use of vocal tract information for recognizing the emotions. Linear prediction cepstral coefficients (LPCC) and mel frequency cepstral coefficients (MFCC) are used as the correlates of vocal tract information. In addition to LPCCs and MFCCs, formant related features are also explored in this work for recognizing emotions from speech. Extraction of the above mentioned ...
A novel instrument to measure acoustic resonances of the vocal tract during phonation
Acoustic resonances of the vocal tract give rise to formants (broad bands of acoustic power) in the speech signal when the vocal tract is excited by a periodic signal from the vocal folds. This paper reports a novel instrument which uses a real-time, non-invasive technique to measure these resonances accurately during phonation. A broadband acoustic current source is located just outside the mo...
Publication date: 2007